In this example we use R and Stan files from BuildEmulator and HistoryMatching directories to construct GP emulator and perform two consecutive waves of history matching. First we start with loading the necessary packages, functions and compilation of Stan files that are used to construct a GP emulator and perform history macthing.
setwd('..')
source('BuildEmulator/BuildEmulator.R')
source('HistoryMatching/HistoryMatching.R')
We proceed to loading the data file lmdzAWave1.RData that contains the computer model simulations. The computer model inputs are saved in columns 1 to 6 (on \([-1, 1]\) scale) of the object tData together with the simulator response saved as a last column of a data frame.
The values of observations, tObs, and the variance of the observation errors, tObsErr, are saved in the data file lmdzLES.RData
load('../Demonstrations/lmdzAWave1.RData')
head(tData)
## thermals_fact_epsilon thermals_betalpha thermals_detr_min
## 1 0.4915765 0.1732043 0.05777618
## 2 0.1090608 -0.7560815 0.42673494
## 3 0.7168394 0.5719379 0.78002271
## 4 -0.1596120 0.8538115 -0.36083351
## 5 -0.2419164 0.7978461 0.52854255
## 6 0.8185074 -0.5528027 -0.71106412
## thermals_entr_min thermals_afact thermals_ed_dz Noise SCMdata
## 1 -0.5929630 0.9297591 -0.69162746 0.3218885 305.1078
## 2 -0.4541389 0.6648306 0.62294878 -0.2098504 304.9819
## 3 0.6277340 0.5190513 -0.01550968 0.6968593 304.7632
## 4 0.2290768 0.7128034 0.53940245 0.3559761 305.0042
## 5 -0.8571062 0.1509087 -0.10689711 0.2144832 305.1566
## 6 -0.0807132 0.2156313 -0.39375864 1.1074743 304.8360
load('../Demonstrations/lmdzLES.RData')
print(tObs)
## [1] 305.4063
print(tObsErr)
## [1] 0.000320935
We repeat the same steps decribed in emulate-lmdz.Rmd to construct and validate a GP emulator (please see the relevant R Markdown for more details).
Specify a number and a vector of metrics of interest (computer model output), a number and vector of candidates (potential emulator inputs).
metric = c('SCMdata')
nmetric = length(metric)
cands = names(tData)[-which(names(tData) == metric)]
param = cands[-which(cands == "Noise")]
nparam = length(param)
We produce scatter plots to visualize the model response behaviour against model inputs standardized on \([-1, 1]\) scale via scatter plot.
if(nparam*nmetric<2){ par(mfrow = c(1, 1), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 4){ par(mfrow = c(2, 2), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 9){ par(mfrow = c(3, 3), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 16){ par(mfrow = c(4, 4), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 25){ par(mfrow = c(5,5), mar=c(4, 4, 1, 1)) }
for ( iparam in 1:nparam ) {
for (j in 1:nmetric) {
ymin=tObs[j]-sqrt(tObsErr[j])
ymax=tObs[j]+sqrt(tObsErr[j])
plot(tData[,iparam],tData[,nparam+j+1],col=2,xlab=cands[iparam],ylab=metric[j],
ylim=range(tData[,nparam+j+1], ymin, ymax))
abline(h=c(tObs[j], ymin, ymax), col = 'blue', lty =2, lwd=c(1,3,3))
}
}
We proceed to building a GP emulator. Please review emulate-lmdz.Rmd R Markdown for more details.
listnew <-lapply(1:nmetric,function(k) names(tData[1:nparam]))
myem.lm <-InitialBasisEmulators(tData=tData,HowManyEmulators=nmetric,additionalVariables = listnew)
Perform diagnostics for GP emulator, in particular, the traceplot diagnostics:
traceplot(myem.lm[[1]]$StanModel)
## 'pars' not specified. Showing first 10 parameters by default.
as well as the Leave One Out (LOO) plots
for (i in 1:nmetric) {
tLOOs1 <- LOO.plot(StanEmulator = myem.lm[[i]] , ParamNames=names(tData)[1:nparam])
}
We are ready to proceed to Wave 1 history macthing. We are interested to search for vaues of the model input parameters that lead to `close enough’ models (as defined by our uncertainties). In order to rule out implausible regions of input space, we adopt the implausibility function \(\mathcal{I}(\boldsymbol{x})\) defined as
\[\mathcal{I}(\boldsymbol{x})=\frac{\vert z-E[f(\boldsymbol{x})]\vert}{\sqrt{Var[e] + Var[\eta] + Var[f(\boldsymbol{x})]}}\] where \(z\) corresponds to tObs, \(Var[e]\) corresponds to tObsErr. The values of \(E[f(\boldsymbol{x})]\) and \(Var[f(\boldsymbol{x})]\) are generated by using our emulator at unseen \(\boldsymbol{x}\in\mathcal{X}\). Currently, the value of \(Var[\eta]\) (variance of model discrepancy) has to be pre-specified by the user.
We are interested to obtain the Not Ruled Out Yet (NROY) space, which is the subsetof parameter space, \(\mathcal{X}\), for which \(\mathcal{I}(\boldsymbol{x})\leq a\), where \(a\) (cutoff) is a pre-specified threshold.
Mathematically, NROY space is defined as
\[\mathcal{X}^1=\Big\{\boldsymbol{x}\in\mathcal{X}: \mathcal{I}(\boldsymbol{x})\leq a \Big\}.\] We start by generating 10,000 points random LHC (Latin Hypercube), stored in data frame Xp, to represent the parameter space \(\mathcal{X}\), at which we are planning to compute the implausibility function defined previously.
Xp <- as.data.frame(2*randomLHS(10000, nparam)-1)
names(Xp) <- param
head(Xp)
## thermals_fact_epsilon thermals_betalpha thermals_detr_min
## 1 0.6253043 0.57184361 0.2285519
## 2 0.7255172 -0.04879562 -0.5273907
## 3 -0.7355718 0.39670634 0.3745444
## 4 0.9068556 0.11254546 -0.9434288
## 5 0.4690120 0.65720895 0.2504248
## 6 -0.4225239 -0.23933420 0.7917741
## thermals_entr_min thermals_afact thermals_ed_dz
## 1 -0.6343077 -0.4899556 -0.3267835
## 2 0.4062689 -0.4586031 -0.3943953
## 3 -0.1606490 0.7170593 -0.8616941
## 4 -0.8125726 0.4866033 0.6553166
## 5 -0.1031205 0.2115441 0.8636980
## 6 0.8266682 0.9030579 -0.7621309
Before we proceed to performing history matching we (our collaborators) need to specify the threshold value \(a\) (cutoff) together with variance of model discrepancy \(Var[\eta]\) (Disc).
Disc = 0
cutoff = 3
Let’s obtain the implausibility function values at \(\boldsymbol{x}\in\mathcal{X}\) (Xp):
Timps <- ManyImplausibilitiesStan(NewData=Xp, Emulator=myem.lm, Discrepancy=Disc,
Obs=tObs, ObsErr=tObsErr, is.GP=NULL,FastVersion = TRUE)
head(Timps)
## [,1]
## [1,] 25.051769
## [2,] 25.195377
## [3,] 10.648213
## [4,] 26.784709
## [5,] 31.225073
## [6,] 4.608641
We are interested in visualizing the results of history matching by generating the implausibility plots.
Firstly, we need to generate the implausibility lists using function CreateImpList. For function CreateImpList we need to specify ImpData, a data frame that contains the input points, Xp, and the corresponding values of implausibility function evaluated at Xp.
ImpData_wave1 = cbind(Xp, Timps)
VarNames <- names(Xp)
valmax = 1
ImpListM1 = CreateImpList(whichVars = 1:nparam, VarNames=VarNames, ImpData=ImpData_wave1, nEms=length(myem.lm), whichMax=valmax)
imp.layoutm11(ImpListM1,VarNames,VariableDensity=FALSE,newPDF=FALSE,the.title=paste("InputSpace_wave",WAVEN,".pdf",sep=""),newPNG=FALSE,newJPEG=FALSE,newEPS=FALSE)
We managed to produce the NROY density and minimum implausibility plots for 2D projections of the parameters. Each panel on the upper triangle shows the proportion of parameter settings behind each pixel that are NROY. Grey regions are completely ruled out.
The lower triangle shows minimum implausibility plots. We plot the value of the smallest implausibility found in each pixel. For comparative purposes the plots have the same orientation as those on the upper triangle. Green and yellow areas of this plot would indicate the location of potentially `good’ settings of model parameters in case when the emulator performs well. For a second Wave we would look to further explore the green and yellow areas of these plots as these areas correspond to NROY space after Wave 1.
By performing a single Wave of HM we have managed to cut out parameter space and achieve an NROY space of size
NROY1 = which(Timps < cutoff)
print(paste('The size of NROY space after Wave 1 HM', length(NROY1)/dim(Xp)[1]*100, "% of original input space"),
sep = " ")
## [1] "The size of NROY space after Wave 1 HM 12.08 % of original input space"
Post-processing History Matching results:
XpNext = Xp[NROY1, ]
NROY.list = list()
Impl.list = list()
Impl.list[[1]] = matrix(Timps, ncol = 1)
NROY.list[[1]] = matrix(NROY1, ncol = 1)
History matching method is most powerful when refocussing steps are taken, in particular within NROY space a new ensemble is run and the procedure is repeated.
Mathematically, NROY space in wave \(k\) is defined as
\[\mathcal{X}^k=\big\{\boldsymbol{x}\in\mathcal{X}^{k-1}:\mathcal{I}(\boldsymbol{x}; \boldsymbol{F}_{[k]})\leq a\big\}\] where \(\mathcal{I}(\boldsymbol{x}; \boldsymbol{F}_{[k]})\) is computed by employing an emulator for \(f(\boldsymbol{x})\) defined inside \(\mathcal{X}^{k-1}\) and obtained based on the design \[\boldsymbol{X}_{[k]}=\big(\boldsymbol{x}_{k,1}, \dots, \boldsymbol{x}_{k,n} \big)^T\in\mathcal{X}^{k-1}\] and the computer model evaluations \[\boldsymbol{F}_{[k]}=\big(f(\boldsymbol{x}_{k,1}), \dots, f(\boldsymbol{x}_{k,n}) \big)^T\].
We continue with loading the data file lmdzAWave2.RData that contains model simulations for Wave 2.
load('../Demonstrations/lmdzAWave2.RData')
head(tData)
## thermals_fact_epsilon thermals_betalpha thermals_detr_min
## 7211 -0.8485162 0.3595744 0.7235801
## 5477 -0.7545752 0.7662795 -0.2299203
## 4452 -0.7299809 0.0905942 -0.7836830
## 795 -0.3296033 0.2316680 0.4715638
## 8620 -0.8844568 0.2094111 -0.9308307
## 9297 -0.7758901 -0.5702894 0.8020564
## thermals_entr_min thermals_afact thermals_ed_dz Noise SCMdata
## 7211 0.2932945 -0.7705251 -0.5295738 -0.3167133 305.3808
## 5477 -0.8063424 0.7925850 -0.0585598 -0.4739830 305.3916
## 4452 0.3134712 -0.9384577 -0.4350848 0.4848220 305.3444
## 795 -0.8291360 0.2793945 -0.5868343 0.4710571 305.3753
## 8620 -0.4115217 -0.6832195 -0.5410491 0.4028520 305.4609
## 9297 0.8219182 -0.4770951 -0.5897296 -0.1712895 305.4330
Similarly to Wave 1, we produce scatter plots
if(nparam*nmetric<2){ par(mfrow = c(1, 1), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 4){ par(mfrow = c(2, 2), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 9){ par(mfrow = c(3, 3), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 16){ par(mfrow = c(4, 4), mar=c(4, 4, 1, 1))
} else if(nparam*nmetric <= 25){ par(mfrow = c(5,5), mar=c(4, 4, 1, 1)) }
for ( iparam in 1:nparam ) {
for (j in 1:nmetric) {
ymin=tObs[j]-sqrt(tObsErr[j])
ymax=tObs[j]+sqrt(tObsErr[j])
plot(tData[,iparam],tData[,nparam+j+1],col=2,xlab=cands[iparam],ylab=metric[j],
ylim=range(tData[,nparam+j+1], ymin, ymax), xlim = c(-1, 1))
abline(h=c(tObs[j], ymin, ymax), col = 'blue', lty =2, lwd=c(1,3,3))
}
}
We proceed to constructing a GP emulator.
listnew <-lapply(1:nmetric,function(k) names(tData[1:nparam]))
myem.lm2 <-InitialBasisEmulators(tData=tData,HowManyEmulators=nmetric,additionalVariables = listnew)
Prior to using a newly obtained GP emulator for history matching, it is important to validate its performance, using a traceplot:
traceplot(myem.lm2[[1]]$StanModel)
## 'pars' not specified. Showing first 10 parameters by default.
as well as the Leave One Out (LOO) plots
for (i in 1:nmetric) {
tLOOs2 <- LOO.plot(StanEmulator = myem.lm2[[i]] , ParamNames=names(tData)[1:nparam])
}
Before performing second Wave of History Matching we could redefine some of the parameters such as a threshold value \(a\) (cutoff). Let’s compute an implausibility values at the reduced input space \(\mathcal{X}^1\) XpNext.
cutoff = 2
Timps2 = ManyImplausibilitiesStan(NewData=XpNext, Emulator=myem.lm2, Discrepancy=Disc,
Obs=tObs, ObsErr=tObsErr, is.GP=NULL,FastVersion = TRUE)
head(Timps)
## [,1]
## [1,] 25.051769
## [2,] 25.195377
## [3,] 10.648213
## [4,] 26.784709
## [5,] 31.225073
## [6,] 4.608641
As before we are interested in visualizing the result of history matching for Wave 2 by generating the implausibility plots.
Firstly, we need to generate the implausibility lists using function CreateImpListWaveM. For function CreateImpListWaveM we need to specify ImpData that contains the implausibility calculations produced at Wave 1 and Wave 2.
Impl.list[[2]] = matrix(Timps2, ncol = 1)
NROY.list[[2]] = matrix(which(Timps2 < cutoff), ncol = 1)
ImpData <- ImpDataWaveM(Xp, NROY.list, Impl.list)
ImpListM2 <- CreateImpListWaveM(whichVars = 1:nparam, VarNames=VarNames, ImpData = ImpData,
nEms=nmetric, Resolution=c(15,15), whichMax=1)
imp.layoutm11(ImpListM2,VarNames,VariableDensity=FALSE,newPDF=FALSE,the.title=paste("InputSpace_wave",2,".pdf",sep=""),newPNG=FALSE,newJPEG=FALSE,newEPS=FALSE)
By performing a second Wave of HM we have managed to cut out parameter space and achieve an NROY space of size
NROY2 = which(Timps2 < cutoff)
print(paste('The size of NROY space after Wave 2 HM', length(NROY2)/dim(Xp)[1]*100, "% of original input space"),
sep = " ")
## [1] "The size of NROY space after Wave 2 HM 6.61 % of original input space"